Computing Minimal Hitting Sets with Genetic Algorithm


Lin Li^1,2 and Jiang Yunfei^1


Abstract. A set S that has a non-empty intersection with every set in a collection of sets C is called a hitting set of C. If no element can be removed from S without violating the hitting set property, S is said to be minimal. Several interesting problems can be partly formulated as problems in which one or more minimal hitting sets have to be found. Many of these problems require exact solutions, but sometimes approximate solutions are sufficient. We devise a genetic algorithm and improved variants of it for computing minimal hitting sets. An improvement enables them to obtain most of the minimal hitting sets efficiently. Furthermore, the resulting sets are smaller, i.e., they involve fewer rules.


1 INTRODUCTION 

Many theoretical and practical problems, e.g., [1-8], can be partly reduced to an instance of the minimal hitting set problem or one of its relatives, such as the minimum set cover problem, model-based diagnosis [1-5, 7-8], and the teacher-and-course problem.

Roughly speaking, it is the problem of selecting a minimal set (e.g., of teachers) that has a non-empty intersection with each set in a collection (e.g., for each course, the set of teachers who can teach it). That is to say, every course can be taught by at least one selected teacher. This is a formulation of the minimal hitting set problem, which in general is NP-hard [6].

In general there are many hitting sets, but sometimes we need only one or a few of them. There are algorithms [1-8] for computing all of the minimal hitting sets, but their space and time efficiency is not ideal. We present a novel method based on the genetic algorithm (GA) for computing minimal hitting sets.

Definition 1. (Hitting sets)

Given a collection $C = \{S_i \mid i \in \mathbb{N}\}$ of sets of elements from some universe $U$, a hitting set is a set $S \subseteq U$ such that $S \cap S_i \neq \emptyset$ for all $i$, i.e., a set that contains at least one element from every set in $C$. Let $HS(C)$ denote the collection of all hitting sets of $C$, and let $MHS(C)$ denote the collection of minimal (with respect to set inclusion) subsets in $HS(C)$. These are called the minimal hitting sets of $C$.

We introduce a minimizing operator $\mu$ [5] such that $MHS(C) = \mu(HS(C))$. We will use $\mu$ to obtain minimal conflict (or hitting) sets from conflict (or hitting) sets.
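As an illustration, the minimizing operator $\mu$ can be read as a filter that keeps exactly those sets of a collection that have no proper subset in the same collection. The following Python sketch assumes the sets are given as Python sets; the function name `mu` is our own and is not taken from [5].

```python
def mu(collection):
    """Minimizing operator: keep the sets that have no proper subset in the collection."""
    unique = {frozenset(s) for s in collection}  # duplicates collapse to one copy
    # t < s is Python's proper-subset test for frozensets
    return {s for s in unique if not any(t < s for t in unique)}


candidates = [{"M1"}, {"M1", "M2"}, {"A1"}, {"M2", "A2"}, {"M2", "A2", "M3"}]
print(mu(candidates))  # keeps {M1}, {A1}, {M2, A2}
```

Supersets such as {M1, M2} and {M2, A2, M3} are removed because a proper subset of each is present in the collection.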

Determining a minimal cardinality element of $MHS(C)$ is called the minimal hitting set problem.

Example 1. Model-based diagnosis [1], as shown in Figure 1. Suppose the conflict sets are {M1, M2, A1} and {M1, A1, A2, M3}. The minimal hitting sets (diagnoses) are {M1}, {A1}, {M2, A2}, and {M2, M3}.

{M1} and {A1} are of minimal cardinality.


Figure 1. A simple circuit with 3 multipliers and 2 adders.

A minimal cardinality hitting set is a minimal hitting set of minimal cardinality.
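To make these definitions concrete, the sketch below enumerates candidate sets by increasing size and checks the result of Example 1. It is an exhaustive brute-force check for illustration only, not the method of this paper: its running time grows exponentially with the number of components, which is the cost that motivates the GA. The helper names are hypothetical.

```python
from itertools import combinations


def is_hitting_set(candidate, conflicts):
    """True if candidate intersects every conflict set."""
    return all(candidate & c for c in conflicts)


def minimal_hitting_sets(conflicts):
    """Brute force: try candidates by increasing size, keep the minimal hitting sets."""
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            cand = frozenset(combo)
            # A hitting set containing an already-found (smaller) one is not minimal.
            if is_hitting_set(cand, conflicts) and not any(h <= cand for h in found):
                found.append(cand)
    return found


conflicts = [{"M1", "M2", "A1"}, {"M1", "A1", "A2", "M3"}]
mhs = minimal_hitting_sets(conflicts)
smallest = min(len(h) for h in mhs)
min_card = [h for h in mhs if len(h) == smallest]
print(mhs)       # {A1}, {M1}, {M2, A2}, {M2, M3}
print(min_card)  # {A1}, {M1}
```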

For large collections of conflict sets, computing the hitting sets consumes a great deal of both time and space, as shown in Figure 2.

Systems such as vehicles, computer systems, power plants, and aircraft may contain millions of components. We therefore developed a novel, efficient GA to compute minimal hitting sets. Even when the collection of conflict sets becomes large, the GA method can still compute minimal hitting sets in a very short time.


2 GENETIC ALGORITHM

A genetic algorithm is a heuristic for function optimization in cases where the extremum of the function (i.e., its minimum or maximum) cannot be established analytically. A population of potential solutions is refined iteratively using a strategy inspired by Darwinian evolution, or natural selection. Genetic algorithms promote “survival of the fittest”. This type of heuristic has been applied in many different fields, including the construction of neural networks and multi-disorder diagnosis.

For the minimal hitting set problem, a straightforward choice of population is a set P of elements from $2^U$, encoded as bit-vectors, where each bit indicates the presence of a particular element in the set.
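A minimal sketch of this encoding, assuming the universe is given as a fixed, ordered Python list so that bit i always refers to the same element; `encode` and `decode` are illustrative names.

```python
def encode(subset, universe):
    """Bit-vector with bit i set to 1 iff universe[i] belongs to subset."""
    return [1 if element in subset else 0 for element in universe]


def decode(bits, universe):
    """Inverse of encode: the set of elements whose bit is 1."""
    return {element for element, bit in zip(universe, bits) if bit}


universe = [1, 2, 3, 4]
print(encode({2, 4}, universe))        # [0, 1, 0, 1]
print(decode([1, 0, 1, 1], universe))  # {1, 3, 4}
```

Fixing the order of the universe once is what makes every chromosome in the population comparable gene by gene.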

Example 2. (Teacher and course problem) Let C denote the collection containing

S1={1, 2, 3, 4}, S2={1, 2, 4}, S3={1, 2}, S4={2, 3}, S5={4}.

This means that there are 5 courses {S1, S2, S3, S4, S5} and 4 teachers 1, 2, 3, 4. Teachers 1, 2, 3 and 4 can teach course S1, teachers 1, 2 and 4 can teach course S2, ..., and teacher 4 can teach course S5. We want to find groups of teachers that together can teach all 5 courses and from which no teacher can be removed. This is a minimal hitting set problem, and the minimal hitting sets are H1={1, 3, 4} and H2={2, 4}.

We use bit-vectors to represent the sets and their hitting sets. These bit-vectors are called “chromosomes”, each bit is called a “gene”, and the collection of all chromosomes is called the “population”.

If we use chromosomes to represent the sets, they are represented as follows:

S1={1, 1, 1, 1}, S2={1, 1, 0, 1}, S3={1, 1, 0, 0}, S4={0, 1, 1, 0}, S5={0, 0, 0, 1}.

The minimal hitting sets are H1={1, 0, 1, 1} and H2={0, 1, 0, 1}.

Here, $|S_1| = |\bigcup_j S_j|$, so the length of the chromosomes equals $|\bigcup_j S_j|$.
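Using the fitness measure adopted in step 3 of the algorithm (the number of sets a chromosome hits), the chromosomes of Example 2 can be checked as follows. This is a sketch: a chromosome is taken to hit a set when both bit-vectors have a 1 in some common position, and the helper names are hypothetical.

```python
# Conflict sets of Example 2 as chromosomes over the teachers 1, 2, 3, 4
S = [
    [1, 1, 1, 1],  # S1 = {1, 2, 3, 4}
    [1, 1, 0, 1],  # S2 = {1, 2, 4}
    [1, 1, 0, 0],  # S3 = {1, 2}
    [0, 1, 1, 0],  # S4 = {2, 3}
    [0, 0, 0, 1],  # S5 = {4}
]


def hits(chromosome, conflict):
    """True if the two bit-vectors share a 1 in some position."""
    return any(g and c for g, c in zip(chromosome, conflict))


def fitness(chromosome, conflicts):
    """Number of conflict sets hit by the chromosome."""
    return sum(hits(chromosome, c) for c in conflicts)


print(fitness([1, 0, 1, 1], S))  # 5: H1 hits all sets
print(fitness([0, 1, 0, 1], S))  # 5: H2 hits all sets
print(fitness([1, 0, 0, 0], S))  # 3: {1} misses S4 and S5
```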

The genetic operations are “crossover”, “mutation”, “inversion”, “selection”, and “obtain”.

Suppose that the collection of minimal conflict sets is $C = \{S_1, S_2, \ldots, S_{|C|}\}$, and let $n = |\bigcup_i S_i|$ denote the chromosome length.

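“Crossover” operator. Suppose that $S_1 = \{s_{11}, s_{12}, \ldots, s_{1n}\}$ and $S_2 = \{s_{21}, s_{22}, \ldots, s_{2n}\}$ are two chromosomes, and select a random integer $r$ with $0 < r < n$. The offspring $S_3, S_4$ of crossover($S_1$, $S_2$) are

$S_3 = \{s_i \mid s_i = s_{1i} \text{ if } i < r, \text{ else } s_i = s_{2i}\}$,

$S_4 = \{s_i \mid s_i = s_{2i} \text{ if } i < r, \text{ else } s_i = s_{1i}\}$.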
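The crossover can be sketched as a standard one-point crossover. In this Python sketch positions are 0-based, so the first r genes of each child come from one parent and the rest from the other; the optional argument `r` is our addition so that the cut point can be fixed for testing.

```python
import random


def crossover(p1, p2, r=None, rng=random):
    """One-point crossover of two equal-length chromosomes (lists of 0/1)."""
    n = len(p1)
    if r is None:
        r = rng.randint(1, n - 1)  # cut point with 0 < r < n (needs n >= 2)
    child1 = p1[:r] + p2[r:]       # head of p1, tail of p2
    child2 = p2[:r] + p1[r:]       # head of p2, tail of p1
    return child1, child2


print(crossover([1, 1, 1, 1], [0, 0, 0, 0], r=2))  # ([1, 1, 0, 0], [0, 0, 1, 1])
```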

“Mutation” operator. Suppose that $S_1 = \{s_{11}, s_{12}, \ldots, s_{1n}\}$ is a chromosome, and select a random integer $r$ with $1 \le r \le n$. The mutation $S_3$ of $S_1$ is

$S_3 = \{s_i \mid s_i = s_{1i} \text{ if } i \neq r, \text{ else } s_i = 1 - s_{1i}\}$.
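A sketch of the mutation operator: it flips exactly one gene and leaves the parent untouched. Positions are 0-based, and the optional `r` argument (our addition) fixes the position for testing.

```python
import random


def mutate(chromosome, r=None, rng=random):
    """Return a copy of the chromosome with the gene at position r flipped."""
    if r is None:
        r = rng.randrange(len(chromosome))  # 0-based position
    child = list(chromosome)                # do not modify the parent
    child[r] = 1 - child[r]
    return child


parent = [1, 1, 0, 1]
print(mutate(parent, r=2))  # [1, 1, 1, 1]
print(parent)               # [1, 1, 0, 1], unchanged
```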

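“Inversion” operator. Suppose that $S_1 = \{s_{11}, s_{12}, \ldots, s_{1r}, s_{1,r+1}, \ldots, s_{1,r+l}, s_{1,r+l+1}, \ldots, s_{1n}\}$ is a chromosome, where $r$ and $l$ are random numbers. The inversion $S_2$ of $S_1$ reverses the segment $s_{1,r+1}, \ldots, s_{1,r+l}$:

$S_2 = \{s_{11}, s_{12}, \ldots, s_{1r}, s_{1,r+l}, \ldots, s_{1,r+1}, s_{1,r+l+1}, \ldots, s_{1n}\}$.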
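A sketch of the inversion operator with 0-based positions: the genes before position r are kept, the next l genes are reversed, and the remaining genes are kept. `r` and `l` may be passed explicitly; otherwise they are drawn so that the segment stays inside the chromosome, which is our assumption about how the random numbers are constrained.

```python
import random


def invert(chromosome, r=None, l=None, rng=random):
    """Reverse the segment of length l that starts at 0-based position r."""
    n = len(chromosome)
    if r is None:
        r = rng.randrange(n)         # start of the reversed segment
    if l is None:
        l = rng.randint(0, n - r)    # segment length, so that r + l <= n
    child = list(chromosome)
    child[r:r + l] = child[r:r + l][::-1]
    return child


# Distinct gene values are used here only to make the reversal visible.
print(invert([1, 2, 3, 4, 5, 6], r=1, l=3))  # [1, 4, 3, 2, 5, 6]
```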

“Selection” operator. Suppose that there are m chromosomes (candidate sets). We select $\lfloor m/2 \rfloor$ of them and eliminate the others. The selected sets are both “fit” and “minimal”: first, they hit more sets than the others, and second, among sets that hit equally many, their cardinality is smaller.
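The selection operator can be sketched as a sort on two keys, number of sets hit (descending) and number of 1-genes (ascending), followed by keeping the better half. Reading “first ... second” as a lexicographic order is our interpretation; the data are the conflict sets of Example 2, and the function names are hypothetical.

```python
def hits(chromosome, conflict):
    return any(g and c for g, c in zip(chromosome, conflict))


def select(population, conflicts):
    """Keep floor(m/2) chromosomes: most sets hit first, then fewest 1-genes."""
    def key(ch):
        hit = sum(hits(ch, s) for s in conflicts)
        return (-hit, sum(ch))
    return sorted(population, key=key)[: len(population) // 2]


S = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
population = [[1, 1, 1, 1], [0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1]]
print(select(population, S))  # [[0, 1, 0, 1], [1, 1, 1, 1]]
```

Both survivors hit all five sets; {2, 4} is ranked first because it is smaller.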


“Obtain” operator. Suppose that there is a singleton set in the collection. Then every hitting set must hit this set, i.e., the gene standing for its element must always be kept as “1”. We refer to this operator as “obtain”.

The “obtain” operator does not change the result, but it can improve efficiency, much as a giraffe has “obtained” its long neck: the chromosomes then compete only under the “long neck” condition.
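A sketch of the “obtain” operator on Example 2: the singleton S5 = {4} forces the gene of teacher 4 to stay “1”. The helper names `forced_genes` and `obtain` are hypothetical; positions are 0-based.

```python
def forced_genes(conflicts):
    """Positions that must be 1: the only 1-gene of each singleton conflict set."""
    return {s.index(1) for s in conflicts if sum(s) == 1}


def obtain(chromosome, forced):
    """Copy of the chromosome with every forced gene set to 1."""
    return [1 if i in forced else g for i, g in enumerate(chromosome)]


S = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
forced = forced_genes(S)
print(forced)                         # {3}: S5 = {4} forces teacher 4
print(obtain([1, 1, 0, 0], forced))   # [1, 1, 0, 1]
```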

Genetic algorithm.

1. InitializePopulation: randomly generate $k \cdot |C| \cdot |\bigcup_j S_j|$ chromosomes, each an n-length array, where k is a constant coefficient.

2. Test whether one of the stopping criteria (time, fitness, etc.) holds. If so, stop the procedure. Here, the procedure stops after 100 generations.

3. Selection: test the fitness of each chromosome, here the number of sets it hits. Keep the fittest chromosomes and delete the poor ones.

4. Apply genetic operators such as “crossover”, “mutation”, “inversion”, and “obtain” to the selected parents to form offspring.

5. Recombine the offspring and the current population to form a new population with the “selection” operator.

6. Repeat steps 2-5.
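Steps 1-6 can be put together as in the following sketch. Several details are our own assumptions, not choices stated in the paper: each offspring is produced from two randomly chosen parents by crossover, then mutation, then inversion, then “obtain”; duplicate chromosomes are removed during selection so that copies of one chromosome do not crowd out the others; k = 2 and a fixed random seed are used; and $\mu$ is applied to the hitting sets left in the final population. The function name `ga_minimal_hitting_sets` is hypothetical.

```python
import random


def hits(ch, s):
    """True if chromosome ch shares a 1-gene with conflict set s."""
    return any(g and c for g, c in zip(ch, s))


def fitness(ch, conflicts):
    """Number of conflict sets hit by ch (the fitness of step 3)."""
    return sum(hits(ch, s) for s in conflicts)


def ga_minimal_hitting_sets(conflicts, k=2, generations=100, seed=0):
    """Sketch of steps 1-6; conflicts are equal-length 0/1 tuples, length >= 2."""
    rng = random.Random(seed)
    n = len(conflicts[0])                       # chromosome length |U|
    size = k * len(conflicts) * n               # step 1: k*|C|*|U| chromosomes
    forced = {s.index(1) for s in conflicts if sum(s) == 1}

    def obtain(genes):                          # "obtain": singleton genes stay 1
        return tuple(1 if i in forced else g for i, g in enumerate(genes))

    def select(pop, m):                         # "selection": hit more, then smaller
        ranked = sorted(set(pop), key=lambda ch: (-fitness(ch, conflicts), sum(ch), ch))
        return ranked[:m]                       # set() drops duplicate chromosomes

    pop = [obtain(rng.randint(0, 1) for _ in range(n)) for _ in range(size)]
    for _ in range(generations):                # step 2: stop after a fixed budget
        parents = select(pop, size // 2)        # step 3
        offspring = []
        for _ in range(size):                   # step 4: one child per draw
            p1, p2 = rng.choice(parents), rng.choice(parents)
            r = rng.randint(1, n - 1)
            child = list(p1[:r] + p2[r:])       # crossover
            i = rng.randrange(n)
            child[i] = 1 - child[i]             # mutation
            a = rng.randrange(n)
            b = rng.randint(a, n)
            child[a:b] = child[a:b][::-1]       # inversion
            offspring.append(obtain(child))
        pop = select(pop + offspring, size)     # step 5, then back to step 2
    found = [ch for ch in pop if fitness(ch, conflicts) == len(conflicts)]
    # operator mu: drop every hitting set that contains another one found
    return {h for h in found
            if not any(o != h and all(x <= y for x, y in zip(o, h)) for o in found)}


S = [(1, 1, 1, 1), (1, 1, 0, 1), (1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1)]
print(ga_minimal_hitting_sets(S))
```

On Example 2 only 2^3 = 8 distinct chromosomes exist once gene 4 is forced, so the population keeps every chromosome it has seen and the sketch should return both minimal hitting sets, (0, 1, 0, 1) and (1, 0, 1, 1). On large instances the fixed generation budget gives no such guarantee.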

We can also use the genetic algorithm to compute MINIMAL hitting sets from hitting sets.

In step 3, if a chromosome is a hitting set, we apply a mutation operator that only changes a gene $s_r$ from “1” to “0” to obtain its offspring; otherwise, we apply a mutation operator that only changes a gene $s_r$ from “0” to “1”. In the subsequent selection, hitting sets continue to be kept because they are fitter.
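The directed mutation described in this paragraph can be sketched as follows: a hitting set loses a random 1-gene, and a non-hitting set gains a random 0-gene. The function names and the choice of a uniformly random position are our assumptions; the data are the conflict sets of Example 2.

```python
import random

# Conflict sets of Example 2 (teachers 1..4)
S = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]


def is_hitting(ch, conflicts):
    """True if ch shares a 1-gene with every conflict set."""
    return all(any(g and c for g, c in zip(ch, s)) for s in conflicts)


def directed_mutation(ch, conflicts, rng=random):
    """Hitting set: flip a random 1 to 0. Otherwise: flip a random 0 to 1."""
    target = 1 if is_hitting(ch, conflicts) else 0
    positions = [i for i, g in enumerate(ch) if g == target]
    child = list(ch)
    if positions:                      # nothing to flip: return an unchanged copy
        i = rng.choice(positions)
        child[i] = 1 - target
    return child


print(directed_mutation([1, 1, 1, 0], S))  # not a hitting set: [1, 1, 1, 1]
print(directed_mutation([1, 1, 0, 1], S))  # hitting set: one 1-gene becomes 0
```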

In the end, we obtain four kinds of sets:

1. Minimal hitting sets;

2. Minimal hitting sets together with their super-hitting sets (supersets that are also hitting sets); we use the operator $\mu$ to delete the super-hitting sets;

3. Hitting sets that are not minimal and whose minimal sub-hitting sets have not been found;

4. Sets that are not hitting sets; these are deleted by the “selection” operator.

In practice, however, situation 3 never occurred in our GA test program.

With the GA we obtain about 95 percent of the minimal hitting sets (shown in Figure 2).

3 COMPARISON 

We wrote a program to compare HS-tree, BHS-tree [8], and GA; the results are shown in Figure 3 and Figure 4. The elements of every conflict set are between 1 and 20.

In general, GA obtains more than 95 percent of the minimal hitting sets within 100 generations. When the collection is large, HS-tree and BHS-tree cannot run because of “Out of memory”, but GA can still obtain almost all minimal hitting sets efficiently.

The space complexity of HS-tree is about $O(m^n)$, where m is the average of $|S_i|$ and n is $|C|$; that of BHS-tree is about $O(2^{|\bigcup_i S_i|})$; and that of GA is about $O(n|\bigcup_i S_i|)$. So the space efficiency of GA is better than that of HS-tree and BHS-tree.
Figure 2. Running time of BHS-tree, HS-tree, and GA. (CPU: PII 667, 128M main memory, C++, Windows 98)

Figure 3. The number of hitting sets and the percentage obtained by GA.

4 CONCLUSIONS 

In this paper, we have made the following improvements:

1. When the scale of the conflict sets becomes large, this GA can obtain most of the minimal hitting sets in a relatively short time and with little memory, whereas the other algorithms cannot obtain the hitting sets because of “out of memory”.

2. The GA can also obtain MINIMAL hitting sets. If a chromosome is not a hitting set, the “mutation” operator changes only a random gene from “0” into “1”; otherwise, it changes a random gene from “1” into “0”. In this way we can obtain minimal hitting sets.

Example 3. (Continuation of Example 2)

Suppose we obtain H3={1, 1, 0, 1} and know that it is a hitting set. We then apply the “mutation” operator to it, but here we only change “1” into “0”:

H3={1, 1, 0, 1} → {[0], 1, 0, 1} (minimal hitting set)

→ {1, [0], 0, 1} (not a hitting set)

→ {1, 1, 0, [0]} (not a hitting set)

Bracketed genes stand for “mutation” from the parent genes.
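Example 3 can be reproduced by applying every possible “1 into 0” mutation to H3 and testing each child. Because any superset of a hitting set is again a hitting set, a hitting set is minimal exactly when none of its single-gene “1 into 0” children is a hitting set; the sketch below uses that check to confirm that the first child is minimal. The function names are illustrative.

```python
# Conflict sets of Example 2 as chromosomes over teachers 1..4
S = [(1, 1, 1, 1), (1, 1, 0, 1), (1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1)]


def is_hitting(ch):
    """True if ch shares a 1-gene with every conflict set in S."""
    return all(any(g and c for g, c in zip(ch, s)) for s in S)


def one_to_zero_children(ch):
    """All chromosomes obtained by changing exactly one '1' gene into '0'."""
    return [ch[:i] + (0,) + ch[i + 1:] for i, g in enumerate(ch) if g == 1]


def is_minimal_hitting(ch):
    """Hitting set none of whose one-gene reductions is still a hitting set."""
    return is_hitting(ch) and not any(is_hitting(c) for c in one_to_zero_children(ch))


H3 = (1, 1, 0, 1)
for child in one_to_zero_children(H3):
    print(child, is_hitting(child), is_minimal_hitting(child))
# (0, 1, 0, 1) True True
# (1, 0, 0, 1) False False
# (1, 1, 0, 0) False False
```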

3. Although this algorithm cannot obtain all of the minimal hitting sets, after we replace or repair the components we have computed, we can carry out the next diagnosis step by step. Our next research topic is the use of GA to choose a repair/replace action on the set of suspects, or to choose the next measurement.

This GA can be used in many other fields; e.g., a librarian can decide which of the journals referred to by researchers should be purchased under a lack of funds [6, p. 124].